Adaptive Hausdorff Distances and Dynamic Clustering of Symbolic Interval Data
Abstract
This paper presents a partitional dynamic clustering method for interval data based on adaptive Hausdorff distances. Dynamic clustering algorithms are iterative two-step relocation algorithms: at each iteration they construct the clusters and identify a suitable representation or prototype (means, axes, probability laws, groups of elements, etc.) for each cluster by locally optimizing an adequacy criterion that measures the fit between the clusters and their corresponding representatives. In this paper, each pattern is represented by a vector of intervals, and adaptive Hausdorff distances are used to compare two interval vectors. At each iteration, the adaptive distances change for each cluster according to its intra-class structure. The advantage of these adaptive distances is that the clustering algorithm is able to recognize clusters of different shapes and sizes. To evaluate the method, experiments with real and synthetic interval data sets were performed. The evaluation is based on an external cluster validity index (the corrected Rand index) within a Monte Carlo experiment with 100 replications. These experiments showed the usefulness of the proposed method.
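The two-step scheme described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption chosen for illustration: the distance between two interval vectors is taken as a weighted sum of per-variable Hausdorff distances, the prototype of a cluster is built from the medians of interval midpoints and half-lengths (which minimizes the sum of Hausdorff distances, since the Hausdorff distance between two intervals equals the absolute midpoint difference plus the absolute half-length difference), and the per-cluster weights are constrained to have product 1. All function names are hypothetical, and the small `eps` guard is an implementation convenience, not part of any published method.

```python
from math import prod
from statistics import median

def hausdorff(a, b):
    """Hausdorff distance between closed intervals a=(lo, hi) and b=(lo, hi)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def adaptive_distance(x, proto, weights):
    """Weighted sum of per-variable Hausdorff distances (assumed form)."""
    return sum(w * hausdorff(xj, gj) for w, xj, gj in zip(weights, x, proto))

def fit_prototype(members):
    """Per variable: interval built from median midpoint and median half-length."""
    proto = []
    for column in zip(*members):  # all intervals of one variable
        mid = median((lo + hi) / 2 for lo, hi in column)
        half = median((hi - lo) / 2 for lo, hi in column)
        proto.append((mid - half, mid + half))
    return proto

def fit_weights(members, proto, eps=1e-9):
    """Weights minimizing the weighted within-cluster spread, product fixed at 1."""
    spread = [sum(hausdorff(x[j], proto[j]) for x in members) + eps
              for j in range(len(proto))]
    geo_mean = prod(spread) ** (1 / len(spread))
    return [geo_mean / s for s in spread]

def dynamic_clustering(data, seeds, max_iter=50):
    """Alternate allocation and representation steps until labels stop changing."""
    protos = [list(s) for s in seeds]
    weights = [[1.0] * len(data[0]) for _ in seeds]
    labels = None
    for _ in range(max_iter):
        # Allocation step: assign each pattern to its closest prototype.
        new_labels = [
            min(range(len(protos)),
                key=lambda k: adaptive_distance(x, protos[k], weights[k]))
            for x in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Representation step: refit prototype and weights of each non-empty cluster.
        for k in range(len(protos)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:
                protos[k] = fit_prototype(members)
                weights[k] = fit_weights(members, protos[k])
    return labels, protos, weights

# Toy data: six patterns, each a vector of two intervals (made up for illustration).
data = [
    [(0, 1), (0, 1)], [(0.5, 1.5), (0, 2)], [(1, 2), (1, 2)],
    [(10, 11), (10, 12)], [(11, 12), (10, 11)], [(10.5, 11.5), (11, 12)],
]
labels, protos, weights = dynamic_clustering(data, seeds=[data[0], data[3]])
```

Because each cluster carries its own weight vector, a variable along which a cluster is tightly packed receives a larger weight in that cluster only, which is one way the distance can adapt to clusters of different shapes and sizes.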
Similar resources
Hausdorff Distance Measure Based Interval Fuzzy Possibilistic C-Means Clustering Algorithm
Clustering algorithms have been widely used in artificial intelligence, data mining, machine learning, etc. Clustering is an unsupervised classification that divides a data set into groups: similar data belong to the same group, and dissimilar data are assigned to other groups. In general, analyzing interval data requires Type II fuzzy log...
Clustering Interval-valued Data Using an Overlapped Interval Divergence
Identifying a suitable proximity measure between data instances remains an open problem in data clustering applications. As interval-valued data becomes more and more popular, a suitable distance for intervals is needed. Existing distance measures only consider the lower and upper bounds of intervals, but overlook the overlapped area betwe...
Multidimensional Interval-Data: Metrics and Factorial Analysis
Statistical units described by interval-valued variables represent a special case of Symbolic Objects, where all descriptors are quantitative variables. In this context, the paper presents two different metrics in R^p for interval-valued data that are based on the definition of the Hausdorff distance in R. Hausdorff distance in R^p (for any p ≥ 1) is an L∞ norm between pairs of closed sets. However,...
Fuzzy Kohonen clustering networks for interval data
The Fuzzy Kohonen Clustering Network combines the idea of fuzzy membership values for learning rates. It is a kind of self-organizing fuzzy neural network that can show great superiority in processing the ambiguity and the uncertainty of data sets or images. Symbolic data analysis provides suitable tools for managing aggregated data described by intervals. This paper introduces Fuzzy Kohonen Cl...